Load the necessary libraries
library(car) #for regression diagnostics
library(broom) #for tidy output
library(ggfortify) #for model diagnostics
library(DHARMa) #for residual diagnostics
library(performance) #for residuals diagnostics
library(see) #for plotting residuals
library(sjPlot) #for outputs
library(knitr) #for kable
library(effects) #for partial effects plots
library(ggeffects) #for partial effects plots
library(emmeans) #for estimating marginal means
library(modelr) #for auxillary modelling functions
library(tidyverse) #for data wrangling
Here is an example from Fowler, Cohen, and Jarvis (1998). An agriculturalist was interested in the effects of fertilizer load on the yield of grass. Grass seed was sown uniformly over an area and different quantities of commercial fertilizer were applied to each of ten 1 m2 randomly located plots. Two months later the grass from each plot was harvested, dried and weighed. The data are in the file fertilizer.csv in the data folder.
| FERTILIZER | YIELD |
|---|---|
| 25 | 84 |
| 50 | 80 |
| 75 | 90 |
| 100 | 154 |
| 125 | 148 |
| ... | ... |
| FERTILIZER: | Mass of fertilizer (g.m-2) - Predictor variable |
| YIELD: | Yield of grass (g.m-2) - Response variable |
The aim of the analysis is to investigate the relationship between fertilizer concentration and grass yield.
fert = read_csv('../data/fertilizer.csv', trim_ws=TRUE)
## Parsed with column specification:
## cols(
## FERTILIZER = col_double(),
## YIELD = col_double()
## )
glimpse(fert)
## Rows: 10
## Columns: 2
## $ FERTILIZER <dbl> 25, 50, 75, 100, 125, 150, 175, 200, 225, 250
## $ YIELD <dbl> 84, 80, 90, 154, 148, 169, 206, 244, 212, 248
## Explore the first 6 rows of the data
head(fert)
str(fert)
## tibble [10 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ FERTILIZER: num [1:10] 25 50 75 100 125 150 175 200 225 250
## $ YIELD : num [1:10] 84 80 90 154 148 169 206 244 212 248
## - attr(*, "spec")=
## .. cols(
## .. FERTILIZER = col_double(),
## .. YIELD = col_double()
## .. )
Model formula: \[ y_i \sim{} \mathcal{N}(\mu_i, \sigma^2)\\ \mu_i = \beta_0 + \beta_1 x_i \]
where \(y_i\) represents the \(i\) observed values, \(\beta_0\) and \(\beta_1\) represent the intercept and slope respectively, and \(\sigma^2\) represents the estimated variance.
ggplot(fert, aes(y=YIELD, x=FERTILIZER)) +
geom_point() +
geom_smooth()
ggplot(fert, aes(y=YIELD, x=FERTILIZER)) +
geom_point() +
geom_smooth(method='lm')
ggplot(fert, aes(y=YIELD)) +
geom_boxplot(aes(x=1))
fert.lm<-lm(YIELD~1+FERTILIZER, data=fert)
fert.lm<-lm(YIELD~FERTILIZER, data=fert)